feat: specify retry configuration overrides per subgraph#2172
feat: specify retry configuration overrides per subgraph#2172
Conversation
|
Note Reviews pausedUse the following commands to manage reviews:
WalkthroughReplace primitive multi-argument subgraph-retry configuration with a structured, per-subgraph, expression-driven system: add SubgraphRetryOptions/NewSubgraphRetryOptions, a RetryExpressionManager, and an internal retry Manager; wire them into transports, graph server, router config, circuit breaker, traceclient, and tests. Changes
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Comment |
Router-nonroot image scan passed✅ No security vulnerabilities found in image: |
f6f2adc to
ef2a9a6
Compare
8a7da2f to
d86e5cc
Compare
There was a problem hiding this comment.
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
router/internal/retrytransport/retry_transport.go (1)
142-144: Honor context cancellation during backoff sleep.Abort retries promptly if the request context is canceled.
- // Wait for the specified duration - time.Sleep(sleepDuration) + // Wait for the specified duration or abort if canceled + select { + case <-req.Context().Done(): + return resp, req.Context().Err() + case <-time.After(sleepDuration): + }
🧹 Nitpick comments (25)
router-tests/structured_logging_test.go (1)
712-712: Consistent migration to object-based retry options in testsAll updated
WithSubgraphRetryOptions(core.NewSubgraphRetryOptions(... Enabled: false ...))calls look correct and keep logs deterministic by avoiding subgraph retries.If you find yourself disabling retries across many tests, consider a small helper in testenv (e.g.,
testenv.DisableSubgraphRetriesOption()) to reduce repetition and ease future changes.Also applies to: 832-832, 964-964, 1100-1100, 2214-2214
router-tests/telemetry/telemetry_test.go (2)
8695-8700: Per-subgraph retry builder usage looks correct; optionally assert “no retries” explicitlySwitching to
core.WithSubgraphRetryOptions(core.NewSubgraphRetryOptions(...))withEnabled: falsematches the new API and aligns with our prior learning that algorithms shouldn’t be validated when retries are disabled. If you want a stronger assertion, consider wiring anOnRetryFunc(if exposed onSubgraphRetryOptions) to increment a counter and assert it remains zero during this test.
9639-9642: Check all datapoints, not just the first, for absence of the subgraph attributeGuard against future exporter changes that may add more datapoints by asserting across the full set.
- data2 := httpRequestsMetric.Data.(metricdata.Sum[int64]) - atts := data2.DataPoints[0].Attributes - _, ok := atts.Value(attribute.Key(key)) - require.False(t, ok) + data2 := httpRequestsMetric.Data.(metricdata.Sum[int64]) + for _, dp := range data2.DataPoints { + _, ok := dp.Attributes.Value(attribute.Key(key)) + require.False(t, ok) + }router-tests/error_handling_test.go (1)
1383-1387: Good migration to manager-based retry config; consider a helper to DRY testsThe switch to core.WithSubgraphRetryOptions(core.NewSubgraphRetryOptions(...)) with Enabled:false is correct and matches our prior learning that disabled retries shouldn’t validate algorithm/fields. You repeat the same disabled config 4x; a tiny helper keeps tests concise.
Apply this to replace the repeated blocks:
- core.WithSubgraphRetryOptions(core.NewSubgraphRetryOptions(config.TrafficShapingRules{ - All: config.GlobalSubgraphRequestRule{ - BackoffJitterRetry: config.BackoffJitterRetry{Enabled: false}, - }, - })), + disabledSubgraphRetries(),Add this helper near the top of the file (outside the selected ranges):
// test helper func disabledSubgraphRetries() core.Option { return core.WithSubgraphRetryOptions(core.NewSubgraphRetryOptions(config.TrafficShapingRules{ All: config.GlobalSubgraphRequestRule{ BackoffJitterRetry: config.BackoffJitterRetry{Enabled: false}, }, })) }Also applies to: 1419-1423, 1455-1459, 1491-1495
router/core/graph_server.go (1)
266-287: router/core/graph_server.go: guard nil retryOptions and wrap Initialize error- if s.retryOptions.IsEnabled() { + if s.retryOptions != nil && s.retryOptions.IsEnabled() { retryExprManager := expr.NewRetryExpressionManager() // … err = retryManager.Initialize( s.retryOptions.All, s.retryOptions.SubgraphMap, routerConfig.GetSubgraphs(), ) - if err != nil { - return nil, err - } + if err != nil { + return nil, fmt.Errorf("failed to initialize retry manager: %w", err) + } s.retryManager = retryManager }router/internal/expr/retry_expression.go (3)
16-23: Constructor is fine; optional: precompile default expressionAs a small UX improvement, consider precompiling the default expression during construction so first evaluations don’t depend on prior AddExpression calls.
Example (if you decide to change signature elsewhere to allow error return):
m := &RetryExpressionManager{expressionMap: make(map[string]*vm.Program)} _ = m.AddExpression("") // ignore error; Initialize can still validate real strings return m
25-50: Normalize expression key or fix the commentComment says “normalized” but no normalization occurs. Either trim/normalize or update the comment.
import ( "fmt" "reflect" + "strings" "github.com/expr-lang/expr" "github.com/expr-lang/expr/vm" ) @@ -func (c *RetryExpressionManager) AddExpression(exprString string) error { - expression := exprString +func (c *RetryExpressionManager) AddExpression(exprString string) error { + expression := strings.TrimSpace(exprString) @@ - // Use the normalized expression string as the key for deduplication + // Use the trimmed expression string as the key for deduplication c.expressionMap[expression] = program return nil }
53-80: Consider lazy compilation on cache miss instead of silently returning falseFail-closed is safe, but lazily compiling unknown expressions can reduce surprises if Initialize missed one.
func (m *RetryExpressionManager) ShouldRetry(ctx RetryContext, expressionString string) (bool, error) { if m == nil { return false, nil } - expression := expressionString + expression := expressionString if expression == "" { expression = defaultRetryExpression } program, ok := m.expressionMap[expression] if !ok { - // If the expression wasn't pre-compiled, do not retry by default - return false, nil + // Lazily compile the expression once + if err := m.AddExpression(expression); err != nil { + return false, err + } + program = m.expressionMap[expression] }If AddExpression can be invoked concurrently in your environment, add an RWMutex around expressionMap accesses.
Would you like me to add a minimal, lock-guarded version?
router/internal/expr/retry_expression_test.go (2)
141-146: Strengthen the empty-expression testConsider asserting runtime behavior as well (e.g., ShouldRetry returns false) to guard against regressions, not just that AddExpression("") succeeds and the manager is non-nil.
err := manager.AddExpression("") assert.NoError(t, err) assert.NotNil(t, manager) +res, evalErr := manager.ShouldRetry(RetryContext{}, "") +assert.NoError(t, evalErr) +assert.False(t, res)
528-551: Parallelize subtests to speed up and isolate failures (optional)You can mark the top-level test and subtests with t.Parallel() for faster runs; table-driven subtests here are good candidates.
func TestRetryExpressionManager_WithSyscallErrors(t *testing.T) { + t.Parallel() t.Run("expression combining status and syscall errors", func(t *testing.T) { + t.Parallel()router/core/router.go (3)
187-206: SubgraphRetryOptions: shape and IsEnabled() look goodStructure and enablement logic are sound; nil map iteration is safe. Minor: consider doc comments on fields for clarity (global “All” vs per-subgraph overrides).
1908-1923: Comment mismatch: “circuit breakers” → “retry options”The comment under NewSubgraphRetryOptions mentions circuit breakers; it should refer to retry options.
- // Subgraph specific circuit breakers + // Subgraph-specific retry options
1924-1937: Default algorithm only when retries are enabled (aligns with team preference)Per team learning, when retries are disabled we shouldn’t “enforce” algorithm defaults. Guard the defaulting by Enabled to better reflect that stance.
-func newRetryConfig(config config.BackoffJitterRetry) retrytransport.RetryOptions { - algorithm := config.Algorithm - if algorithm == "" { - algorithm = retrytransport.BackoffJitter - } +func newRetryConfig(config config.BackoffJitterRetry) retrytransport.RetryOptions { + algorithm := config.Algorithm + if config.Enabled && algorithm == "" { + algorithm = retrytransport.BackoffJitter + } return retrytransport.RetryOptions{ Enabled: config.Enabled, Algorithm: algorithm, MaxRetryCount: config.MaxAttempts, MaxDuration: config.MaxDuration, Interval: config.Interval, Expression: config.Expression, } }router/core/retry_builder.go (1)
15-23: Defensive nil check for req (optional)If ShouldRetry ever gets a nil req from the transport layer, req.Context() will panic. Cheap guard:
-reqContext := getRequestContext(req.Context()) +if req == nil { + return false +} +reqContext := getRequestContext(req.Context())router-tests/retry_test.go (1)
76-78: Avoid brittle full-JSON equality in assertions.These assertions will break on harmless message changes. Prefer checking essential fields (e.g., statusCode) or comparing decoded JSON structures.
Example:- require.Equal(t, `{"errors":[{"message":"Failed to fetch from Subgraph 'employees', Reason: empty response.","extensions":{"statusCode":502}}],"data":{"employees":null}}`, res.Body) + require.JSONEq(t, `{"data":{"employees":null}}`, res.Body) + require.Contains(t, res.Body, `"statusCode":502`)Also applies to: 127-129, 180-182, 296-298, 360-372
router/internal/retrytransport/retry_transport_test.go (2)
268-298: Add a zero-retries guard test.Add a case with MaxRetryCount=0 to assert no retries occur and OnRetry isn’t called.
Sketch:
t.Run("no retries when MaxRetryCount=0", func(t *testing.T) { retries, attempts := 0, 0 mgr := newTestManager(simpleShouldRetry, func(int, *http.Request, *http.Response, time.Duration, error) { retries++ }, RetryOptions{Enabled: true, MaxRetryCount: 0, Interval: time.Millisecond, MaxDuration: time.Second}, "sg") tr := NewRetryHTTPTransport(&MockTransport{handler: func(*http.Request) (*http.Response, error) { attempts++; return nil, errors.New("fail") }}, func(*http.Request) *zap.Logger { return zap.NewNop() }, mgr) _, err := tr.RoundTrip(withSubgraph(httptest.NewRequest("GET", "http://x", nil), "sg")) assert.Error(t, err) assert.Equal(t, 0, retries) assert.Equal(t, 1, attempts) })
383-414: Log-string assertions are inherently brittle.If wording changes, tests fail. Consider matching on a stable key/field or use zap observer to assert fields instead of formatted strings.
router/core/retry_builder_test.go (4)
52-57: Test name doesn’t match behavior.No “disabled” path is exercised here; the test only builds the function. Rename to “build function returns non-nil with empty manager” or add a real disabled-case.
-t.Run("build function when retry is disabled", func(t *testing.T) { +t.Run("build function returns non-nil with empty manager", func(t *testing.T) {
176-189: Clarify EOF comment.Comment says “EOF is now handled at transport layer,” but this test expects true here via default retryable error handling. Tweak the comment to avoid confusion.
- // EOF is now handled at transport layer, not expression + // EOF can be retried via default transport-level detection; expression is false here
442-485: Remove stale, commented-out tests.Dead code makes maintenance harder. Either port to the new manager-based API or delete.
-// TODO: Fix -//func TestProcessRetryOptions(t *testing.T) { -// ... (entire commented block) -//} +// (removed legacy ProcessRetryOptions tests; superseded by expression-manager and manager-based flows)
51-57: Add a “no request context” guard test.BuildRetryFunction returns false when request context is missing. Add a unit test to lock this in.
Sketch:
t.Run("returns false when request context missing", func(t *testing.T) { manager := expr.NewRetryExpressionManager() _ = manager.AddExpression("true") fn, _ := BuildRetryFunction(manager) req, _ := http.NewRequest("POST", "http://example.com/graphql", nil) // no ctx attached assert.False(t, fn(nil, req, &http.Response{StatusCode: 500}, "true")) })Want me to open a follow-up PR adding this?
Also applies to: 139-146
router/internal/retrytransport/manager.go (2)
88-91: Remove unreachable config-not-found branch.customSgNames is only populated when a subgraph has an entry in subgraphRetryOptions; this branch cannot trigger.
- entry, ok := subgraphRetryOptions[sgName] - if !ok { - joinErr = errors.Join(joinErr, errors.New("retry config not found for subgraph "+sgName)) - continue - } + entry := subgraphRetryOptions[sgName]
118-121: Rename local variable for clarity.Avoid “circuitBreaker” in retry code.
- if circuitBreaker, ok := c.retries[name]; ok { - return circuitBreaker + if opts, ok := c.retries[name]; ok { + return optsrouter/internal/retrytransport/retry_transport.go (2)
145-151: Option: drain before sleeping to maximize connection reuse.Draining immediately frees the connection for reuse while we wait.
- // drain the previous response before retrying - rt.drainBody(resp, requestLogger) - - // Retry the request - resp, err = rt.roundTripper.RoundTrip(req) + // drain the previous response before waiting/retrying + rt.drainBody(resp, requestLogger) + // Retry the request + resp, err = rt.roundTripper.RoundTrip(req)
50-56: Log level nit: malformed Retry-After.Consider Warn instead of Error to reduce noise for client-driven header issues.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (17)
router-tests/error_handling_test.go(4 hunks)router-tests/panic_test.go(2 hunks)router-tests/retry_test.go(10 hunks)router-tests/structured_logging_test.go(5 hunks)router-tests/telemetry/telemetry_test.go(2 hunks)router/core/graph_server.go(4 hunks)router/core/retry_builder.go(2 hunks)router/core/retry_builder_test.go(15 hunks)router/core/router.go(3 hunks)router/core/router_config.go(2 hunks)router/core/supervisor_instance.go(1 hunks)router/core/transport.go(6 hunks)router/internal/expr/retry_expression.go(2 hunks)router/internal/expr/retry_expression_test.go(2 hunks)router/internal/retrytransport/manager.go(1 hunks)router/internal/retrytransport/retry_transport.go(4 hunks)router/internal/retrytransport/retry_transport_test.go(33 hunks)
🧰 Additional context used
🧠 Learnings (8)
📓 Common learnings
Learnt from: SkArchon
PR: wundergraph/cosmo#2167
File: router-tests/structured_logging_test.go:712-713
Timestamp: 2025-08-28T10:07:27.637Z
Learning: In the router retry configuration, when retries are disabled (enabled=false in WithSubgraphRetryOptions), the algorithm parameter should not be validated since it won't be used. Empty algorithm strings are acceptable when retries are disabled.
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
PR: wundergraph/cosmo#2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.
Applied to files:
router/core/supervisor_instance.gorouter/core/router.gorouter-tests/telemetry/telemetry_test.gorouter/core/router_config.gorouter-tests/retry_test.gorouter-tests/error_handling_test.gorouter-tests/structured_logging_test.gorouter-tests/panic_test.go
📚 Learning: 2025-08-28T10:07:27.637Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2167
File: router-tests/structured_logging_test.go:712-713
Timestamp: 2025-08-28T10:07:27.637Z
Learning: In the router retry configuration, when retries are disabled (enabled=false in WithSubgraphRetryOptions), the algorithm parameter should not be validated since it won't be used. Empty algorithm strings are acceptable when retries are disabled.
Applied to files:
router/core/supervisor_instance.gorouter/core/router.gorouter-tests/retry_test.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
PR: wundergraph/cosmo#2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: In the Cosmo router codebase, JSON schema validation prevents null values in TrafficShapingRules subgraph configurations, making nil checks unnecessary when dereferencing subgraph rule pointers in NewSubgraphTransportOptions.
Applied to files:
router-tests/telemetry/telemetry_test.gorouter-tests/error_handling_test.gorouter-tests/structured_logging_test.gorouter-tests/panic_test.go
📚 Learning: 2025-08-14T16:47:03.744Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2137
File: router/pkg/pubsub/redis/adapter.go:114-116
Timestamp: 2025-08-14T16:47:03.744Z
Learning: In the Cosmo router codebase, EventMetricStore fields are never nil - the system uses the null object pattern with NewNoopEventMetricStore() when metrics are disabled, eliminating the need for nil checks before calling EventMetricStore methods.
Applied to files:
router-tests/telemetry/telemetry_test.go
📚 Learning: 2025-08-26T17:30:41.212Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2167
File: router/core/retry_expression_builder.go:3-11
Timestamp: 2025-08-26T17:30:41.212Z
Learning: In router/core/retry_expression_builder.go, the team prefers to keep EOF error detection simple with string matching ("unexpected eof") rather than using errors.Is() for io.EOF and io.ErrUnexpectedEOF. They want to wait for customer feedback before making it more robust, since customers can modify retry expressions to handle specific cases.
Applied to files:
router/internal/expr/retry_expression_test.gorouter/core/retry_builder_test.gorouter/core/retry_builder.go
📚 Learning: 2025-08-28T09:59:19.956Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2167
File: router/core/supervisor_instance.go:196-204
Timestamp: 2025-08-28T09:59:19.956Z
Learning: In router/pkg/config/config.go, the BackoffJitterRetry struct uses envDefault tags to provide default values for Algorithm ("backoff_jitter") and Expression ("IsRetryableStatusCode() || IsConnectionError() || IsTimeout()"), ensuring that even when YAML configs omit these fields, valid defaults are applied automatically, preventing ProcessRetryOptions from rejecting empty values.
Applied to files:
router/core/retry_builder.go
📚 Learning: 2025-08-26T17:19:28.178Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2167
File: router/internal/expr/retry_context.go:3-10
Timestamp: 2025-08-26T17:19:28.178Z
Learning: In the Cosmo router codebase, prefer using netErr.Timeout() checks over explicit context.DeadlineExceeded checks for timeout detection in retry logic, as Go's networking stack typically wraps context deadline exceeded errors in proper net.Error implementations.
Applied to files:
router/internal/retrytransport/retry_transport_test.go
🧬 Code graph analysis (17)
router/core/supervisor_instance.go (1)
router/core/router.go (2)
WithSubgraphRetryOptions(1776-1780)NewSubgraphRetryOptions(1908-1922)
router/core/router.go (2)
router/internal/retrytransport/manager.go (3)
RetryOptions(25-32)OnRetryFunc(17-17)BackoffJitter(22-22)router/pkg/config/config.go (2)
TrafficShapingRules(164-171)BackoffJitterRetry(231-238)
router-tests/telemetry/telemetry_test.go (2)
router/core/router.go (2)
WithSubgraphRetryOptions(1776-1780)NewSubgraphRetryOptions(1908-1922)router/pkg/config/config.go (3)
TrafficShapingRules(164-171)GlobalSubgraphRequestRule(190-207)BackoffJitterRetry(231-238)
router/internal/expr/retry_expression_test.go (2)
router/internal/expr/retry_expression.go (1)
NewRetryExpressionManager(19-23)router/internal/expr/retry_context.go (1)
LoadRetryContext(132-146)
router/core/router_config.go (1)
router/core/router.go (1)
SubgraphRetryOptions(187-191)
router/internal/expr/retry_expression.go (1)
router/internal/expr/retry_context.go (1)
RetryContext(13-19)
router-tests/retry_test.go (4)
router/core/router.go (3)
NewSubgraphRetryOptions(1908-1922)WithSubgraphRetryOptions(1776-1780)Option(172-172)router/pkg/config/config.go (4)
TrafficShapingRules(164-171)GlobalSubgraphRequestRule(190-207)BackoffJitterRetry(231-238)Config(985-1059)router/internal/retrytransport/manager.go (1)
OnRetryFunc(17-17)router-tests/testenv/testenv.go (4)
Run(105-122)RunWithError(135-150)Environment(1727-1763)SubgraphsConfig(365-378)
router-tests/error_handling_test.go (2)
router/core/router.go (2)
WithSubgraphRetryOptions(1776-1780)NewSubgraphRetryOptions(1908-1922)router/pkg/config/config.go (3)
TrafficShapingRules(164-171)GlobalSubgraphRequestRule(190-207)BackoffJitterRetry(231-238)
router/core/graph_server.go (4)
router/internal/retrytransport/manager.go (3)
Manager(34-40)NewManager(42-49)OnRetryFunc(17-17)router/internal/expr/expr_manager.go (1)
Manager(15-17)router/internal/expr/retry_expression.go (1)
NewRetryExpressionManager(19-23)router/core/retry_builder.go (1)
BuildRetryFunction(13-46)
router-tests/structured_logging_test.go (2)
router/core/router.go (2)
WithSubgraphRetryOptions(1776-1780)NewSubgraphRetryOptions(1908-1922)router/pkg/config/config.go (3)
TrafficShapingRules(164-171)GlobalSubgraphRequestRule(190-207)BackoffJitterRetry(231-238)
router/internal/retrytransport/manager.go (2)
router/internal/expr/expr.go (1)
Request(65-74)router/internal/expr/retry_expression.go (1)
RetryExpressionManager(12-14)
router/core/retry_builder_test.go (3)
router/internal/expr/retry_expression.go (1)
NewRetryExpressionManager(19-23)router/core/retry_builder.go (1)
BuildRetryFunction(13-46)router/core/context.go (2)
OperationTypeQuery(488-488)OperationTypeMutation(489-489)
router/internal/retrytransport/retry_transport.go (3)
router/internal/retrytransport/manager.go (1)
Manager(34-40)router/internal/expr/expr.go (2)
Request(65-74)Context(34-38)router/internal/context/keys.go (1)
CurrentSubgraphContextKey(7-7)
router/core/retry_builder.go (2)
router/internal/expr/retry_expression.go (1)
RetryExpressionManager(12-14)router/internal/retrytransport/manager.go (1)
ShouldRetryFunc(16-16)
router/internal/retrytransport/retry_transport_test.go (5)
router/internal/expr/expr.go (2)
Request(65-74)Context(34-38)router/internal/context/keys.go (1)
CurrentSubgraphContextKey(7-7)router/internal/retrytransport/manager.go (4)
OnRetryFunc(17-17)RetryOptions(25-32)Manager(34-40)NewManager(42-49)router/internal/expr/retry_expression.go (1)
NewRetryExpressionManager(19-23)router/internal/retrytransport/retry_transport.go (1)
NewRetryHTTPTransport(77-87)
router-tests/panic_test.go (2)
router/core/router.go (2)
WithSubgraphRetryOptions(1776-1780)NewSubgraphRetryOptions(1908-1922)router/pkg/config/config.go (3)
TrafficShapingRules(164-171)GlobalSubgraphRequestRule(190-207)BackoffJitterRetry(231-238)
router/core/transport.go (2)
router/internal/retrytransport/manager.go (1)
Manager(34-40)router/internal/retrytransport/retry_transport.go (1)
NewRetryHTTPTransport(77-87)
🔇 Additional comments (23)
router/core/router_config.go (1)
79-79: Switch to pointer: ensure initialization path or guard against nilChanging
retryOptionsto*SubgraphRetryOptionsis fine, but it increases the chance of nil deref at use sites unless always initialized.Please confirm that
core.Config.retryOptionsis assigned during router setup. Currently,WithSubgraphRetryOptionssetsr.retryOptions(see router/core/router.go), but I don’t see wherec.retryOptions(this struct) is populated. If not populated, prefer guarding inUsage()(see below) or setting it alongside the router field.router/core/supervisor_instance.go (1)
196-197: Guard disabled subgraph retries before defaulting algorithm
In NewSubgraphRetryOptions (router/core/router.go ¶1909–1921), you always call newRetryConfig for each subgraph, which unconditionally defaults an empty algorithm—even when Enabled==false. Per our retry-disabled rule, skip both the call and algorithm default for subgraphs with v.BackoffJitterRetry.Enabled==false. Wrap the loop entry in an if v.BackoffJitterRetry.Enabled check so disabled rules retain an empty algorithm.⛔ Skipped due to learnings
Learnt from: SkArchon PR: wundergraph/cosmo#2167 File: router-tests/structured_logging_test.go:712-713 Timestamp: 2025-08-28T10:07:27.637Z Learning: In the router retry configuration, when retries are disabled (enabled=false in WithSubgraphRetryOptions), the algorithm parameter should not be validated since it won't be used. Empty algorithm strings are acceptable when retries are disabled.Learnt from: endigma PR: wundergraph/cosmo#2155 File: router/core/router.go:1857-1866 Timestamp: 2025-08-20T10:08:17.857Z Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.Learnt from: endigma PR: wundergraph/cosmo#2155 File: router/core/router.go:1857-1866 Timestamp: 2025-08-20T10:08:17.857Z Learning: In the Cosmo router codebase, JSON schema validation prevents null values in TrafficShapingRules subgraph configurations, making nil checks unnecessary when dereferencing subgraph rule pointers in NewSubgraphTransportOptions.Learnt from: SkArchon PR: wundergraph/cosmo#2167 File: router/core/supervisor_instance.go:196-204 Timestamp: 2025-08-28T09:59:19.956Z Learning: In router/pkg/config/config.go, the BackoffJitterRetry struct uses envDefault tags to provide default values for Algorithm ("backoff_jitter") and Expression ("IsRetryableStatusCode() || IsConnectionError() || IsTimeout()"), ensuring that even when YAML configs omit these fields, valid defaults are applied automatically, preventing ProcessRetryOptions from rejecting empty values.router-tests/panic_test.go (1)
51-55: Good: retries explicitly disabled for panic-path stabilitySwitching to
core.NewSubgraphRetryOptions(... Enabled: false ...)ensures the router won’t add subgraph retries that could mask panic timing in these tests. Matches the disabled-validation expectation.Also applies to: 87-91
router/core/graph_server.go (3)
43-43: Import looks correctretrytransport is used below; import is appropriate.
104-105: Field addition LGTMStoring the retry manager on graphServer is fine and consistent with how other managers (circuit) are handled.
1208-1209: Wiring RetryManager into transport options is correctExecutorConfigurationBuilder receives the manager; assuming transport handles nil, this is safe.
If not already covered in tests, add one integration that omits WithSubgraphRetryOptions entirely to ensure nil RetryManager is handled gracefully.
router/internal/expr/retry_expression_test.go (2)
125-135: Manager-based API usage looks correctInstantiating the manager, pre-registering expressions with AddExpression, and evaluating via ShouldRetry(ctx, expr) matches the new API. Good assertions on compile-time errors for invalid expressions.
496-525: Good syscall coverage with the manager pathNice coverage for ECONNREFUSED/ECONNRESET/ETIMEDOUT under the manager-evaluated expression. This ensures helper wiring remains intact after the API shift.
router/core/router.go (1)
1776-1779: ConfirmedretryOptionsfield exists on Router.Config
Router embedsConfig, which declaresretryOptions *SubgraphRetryOptions, so theWithSubgraphRetryOptionssetter compiles as expected.router/core/retry_builder.go (1)
55-58: EOF-as-default-retryable matches stated team preferenceKeeping this simple with string matching for “unexpected eof” is consistent with prior guidance to wait for customer feedback before broadening detection.
router-tests/retry_test.go (4)
38-51: Good adoption of builder-style retry options.Switching to NewSubgraphRetryOptions + WithSubgraphRetryOptions makes tests clearer and mirrors the new API.
247-306: Exact sleep equality on Retry-After is OK here.Asserting sleepDuration equals header seconds is deterministic since jitter shouldn’t apply when Retry-After is honored.
649-674: Great coverage of mixed global + per-subgraph overrides.Validates default-from-All and targeted override behavior cleanly.
Also applies to: 676-719
363-371: Unable to run flakiness check in sandbox
Thego testinvocation failed due to missing Go module context. Please run locally and confirm stability:go test ./router-tests -run 'TestFlakyRetry/verify max duration is not exceeded on intervals$' -count=50 -vIf you observe intermittent failures, remove
t.Parallel()from this subtest or widen the timing tolerances around the expected duration.router/internal/retrytransport/retry_transport_test.go (3)
58-61: withSubgraph helper looks correct.Uses the typed context key; safe and explicit.
204-231: Solid body-draining tests.Covers read and close failure paths with log assertions. Good observability checks.
Also applies to: 300-357
483-538: Thorough 429 coverage (seconds, HTTP-date, invalid, negative, mixed).Great breadth; confirms shouldUseRetryAfter behavior and fallbacks.
Also applies to: 540-596, 598-636, 638-697, 699-741, 742-782, 784-863
router/core/retry_builder_test.go (3)
60-85: Default expression behavior assertions look good.Covers 5xx and default network errors.
241-270: Mutation short-circuit behavior well covered.Ensures expressions can’t override the “no retry on mutations” rule.
390-439: Comprehensive positive matrix + mutation negative matrix.Good end-to-end validation of the expression path.
router/internal/retrytransport/manager.go (1)
61-69: LGTM on skipping algorithm validation when retries are disabled.Disabled per-subgraph entries are excluded from validation, aligning with the prior decision captured in learnings.
router/core/transport.go (2)
105-107: Wiring to Manager-based retry looks correct.Conditional wrapping with NewRetryHTTPTransport is clean and preserves the base round tripper path when disabled.
73-75: All call sites updated. Verified NewCustomTransport/NewRetryHTTPTransport and RetryManager usage in router/core/transport.go (lines 106, 367, 388, 428) and router/core/graph_server.go (line 1208).
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
router/internal/retrytransport/retry_transport.go (1)
145-151: Rewind request body before retrying; otherwise subsequent attempts may send an empty/partial body.GraphQL subgraph calls are typically POST with bodies. Without resetting Body via GetBody, retries can be broken.
// drain the previous response before retrying rt.drainBody(resp, requestLogger) // Retry the request - resp, err = rt.roundTripper.RoundTrip(req) + // Reset request body if necessary + if req.Body != nil && req.Body != http.NoBody { + if req.GetBody == nil { + requestLogger.Warn("Cannot retry request with non-rewindable body") + break + } + newBody, gbErr := req.GetBody() + if gbErr != nil { + requestLogger.Error("Failed to reset request body for retry", zap.Error(gbErr)) + return resp, gbErr + } + req.Body = newBody + } + resp, err = rt.roundTripper.RoundTrip(req)
♻️ Duplicate comments (3)
router-tests/retry_test.go (3)
503-531: Great: base-level invalid algorithm is correctly rejected at init.
Matches expected behavior and error text.
533-558: Nice: invalid algorithm is ignored when retries are disabled (base).
This addresses the previously requested coverage.
560-587: Nice: per-subgraph “disabled ignores algorithm” case covered.
Prevents regressions in validation gating.
🧹 Nitpick comments (14)
router-tests/retry_test.go (8)
38-51: DRY the options setup with a small helper to reduce duplication.
Multiple tests repeat the same NewSubgraphRetryOptions + WithSubgraphRetryOptions pattern; a local builder will simplify maintenance.Apply this helper once near the top and replace repeated blocks:
+func withRetryOpts(cfg config.TrafficShapingRules, onRetry func(int, *http.Request, *http.Response, time.Duration, error)) core.Option { + opts := core.NewSubgraphRetryOptions(cfg) + opts.OnRetryFunc = onRetry + return core.WithSubgraphRetryOptions(opts) +}Also applies to: 92-105, 145-158, 199-212, 260-273, 324-337, 391-404, 451-464
224-231: Decouple success trigger from OnRetry callback semantics.
Using onRetryCounter to flip success couples the test to callback timing. Drive success via request attempt count to avoid false positives if OnRetry invocation timing changes.- if onRetryCounter.Load() == int32(maxAttemptsBeforeServiceSucceeds) { + if serviceCallsCounter.Load() == int32(maxAttemptsBeforeServiceSucceeds) { w.WriteHeader(http.StatusOK) _, _ = w.Write([]byte(`{"data":{"employees":[{"id":1},{"id":2}]}}`)) } else { w.WriteHeader(http.StatusBadGateway) } serviceCallsCounter.Add(1)
256-257: Speed up the 429 + Retry-After test.
This test currently sleeps ~3s (3 retries × 1s). One retry is sufficient to validate header precedence.- maxRetryCount := 3 + maxRetryCount := 1
433-434: Avoid flakiness: assert a range instead of NotEqual.
Exact equality with retryInterval can sporadically occur depending on jitter/rounding.- require.NotEqual(t, sleepDuration.Load(), int64(retryInterval)) + d := time.Duration(sleepDuration.Load()) + require.Greater(t, d, time.Duration(0)) + require.Less(t, d, 2*retryInterval)
494-495: Same here: use bounded checks to reduce flakiness.
Zero Retry-After should fall back to jitter/backoff; prefer range assertions.- require.NotEqual(t, sleepDuration.Load(), int64(retryInterval)) + d := time.Duration(sleepDuration.Load()) + require.Greater(t, d, time.Duration(0)) + require.Less(t, d, 2*retryInterval)
360-362: Prefer JSON structural equality over string equality for response bodies.
String comparisons are brittle to formatting/ordering changes.- require.Equal(t, `{"errors":[{"message":"Failed to fetch from Subgraph 'employees', Reason: empty response.","extensions":{"statusCode":502}}],"data":{"employees":null}}`, res.Body) + require.JSONEq(t, `{"errors":[{"message":"Failed to fetch from Subgraph 'employees', Reason: empty response.","extensions":{"statusCode":502}}],"data":{"employees":null}}`, res.Body)(Apply similarly to the other exact body checks.)
Also applies to: 681-685, 761-763
366-372: Widen timing slack to reduce CI flakes.
A 20ms/100ms margin can be tight under load; slightly larger buffers help.- shouldBeLessThanDuration := (time.Duration(maxRetryCount-1) * retryInterval) - (20 * time.Millisecond) + shouldBeLessThanDuration := (time.Duration(maxRetryCount-1) * retryInterval) - (50 * time.Millisecond) ... - expectedMinDuration := (time.Duration(maxRetryCount-1) * maxDuration) - (100 * time.Millisecond) + expectedMinDuration := (time.Duration(maxRetryCount-1) * maxDuration) - (150 * time.Millisecond)
621-697: Per-subgraph retry counts validated correctly.
Isolated counters for employees vs test1 look good.If useful, I can add a combined query hitting both subgraphs in a single operation to verify independent retry accounting within one request.
router/internal/retrytransport/retry_transport.go (6)
142-144: Honor context cancellation/deadlines during backoff sleep.Avoid sleeping past ctx deadlines; exit early if canceled.
- // Wait for the specified duration - time.Sleep(sleepDuration) + // Wait for the specified duration or context cancellation + if sleepDuration > 0 { + select { + case <-req.Context().Done(): + return resp, req.Context().Err() + case <-time.After(sleepDuration): + } + }
59-66: Also honor Retry-After for 503 Service Unavailable.RFC 7231 allows Retry-After on 503 responses as well as 429. This improves interoperability with upstreams.
- if resp == nil || resp.StatusCode != http.StatusTooManyRequests { + if resp == nil || (resp.StatusCode != http.StatusTooManyRequests && resp.StatusCode != http.StatusServiceUnavailable) { return 0, false }
124-128: Generic log for Retry-After usage (it may be 429 or 503).- requestLogger.Debug("Using Retry-After header for 429 response", + requestLogger.Debug("Using Retry-After header", zap.Int("retry", retries), zap.String("url", req.URL.String()), zap.Duration("retry-after", sleepDuration), )
51-53: Lower log level for malformed Retry-After headers.Header parsing errors are client-observable but non-fatal; debug (or warn) is less noisy in production.
- logger.Error("Failed to parse Retry-After header", zap.String("retry-after", retryAfter), zap.Error(errJoin)) + logger.Debug("Failed to parse Retry-After header", zap.String("retry-after", retryAfter), zap.Error(errJoin))
173-176: Tighten the comment wording.- // When we close the body only will go internally marks the persisted connection as true - // which is important so that it can reuse the connection internally for retrying + // Draining then closing marks the connection as reusable, allowing the client to reuse it for retries.
105-111: Optional: short-circuit when retries are effectively disabled.If MaxRetryCount == 0 (or options are disabled), return immediately to avoid backoff setup and logging overhead. Note: per team learning, IsEnabled() handles nil receivers safely if available.
retryOptions := rt.retryManager.GetSubgraphOptions(subgraph) - if retryOptions == nil { - return resp, nil - } + if retryOptions == nil { + return resp, err + } + // Optional fast-exit: + // if retryOptions.MaxRetryCount == 0 { return resp, err } + // or if available: + // if !retryOptions.IsEnabled() { return resp, err }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
router-tests/retry_test.go(10 hunks)router/internal/retrytransport/manager.go(1 hunks)router/internal/retrytransport/retry_transport.go(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- router/internal/retrytransport/manager.go
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-09-01T13:44:13.033Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2172
File: router/core/router_config.go:221-221
Timestamp: 2025-09-01T13:44:13.033Z
Learning: The IsEnabled() method on SubgraphRetryOptions in router/core/router.go safely handles nil receivers by returning false as the first check, so no external nil checks are needed when calling c.retryOptions.IsEnabled(). This is a consistent pattern used across multiple IsEnabled() methods in the codebase.
Applied to files:
router-tests/retry_test.gorouter/internal/retrytransport/retry_transport.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
PR: wundergraph/cosmo#2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.
Applied to files:
router-tests/retry_test.go
📚 Learning: 2025-09-01T13:44:13.033Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2172
File: router/core/router_config.go:221-221
Timestamp: 2025-09-01T13:44:13.033Z
Learning: The IsEnabled() method on SubgraphRetryOptions in router/core/router.go safely handles nil receivers by returning false, so no external nil checks are needed when calling c.retryOptions.IsEnabled().
Applied to files:
router-tests/retry_test.gorouter/internal/retrytransport/retry_transport.go
📚 Learning: 2025-08-28T10:07:27.637Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2167
File: router-tests/structured_logging_test.go:712-713
Timestamp: 2025-08-28T10:07:27.637Z
Learning: In the router retry configuration, when retries are disabled (enabled=false in WithSubgraphRetryOptions), the algorithm parameter should not be validated since it won't be used. Empty algorithm strings are acceptable when retries are disabled.
Applied to files:
router-tests/retry_test.go
🧬 Code graph analysis (2)
router-tests/retry_test.go (4)
router/core/router.go (3)
NewSubgraphRetryOptions(1908-1922)WithSubgraphRetryOptions(1776-1780)Option(172-172)router/pkg/config/config.go (4)
TrafficShapingRules(164-171)GlobalSubgraphRequestRule(190-207)BackoffJitterRetry(231-238)Config(985-1059)router/internal/retrytransport/manager.go (1)
OnRetryFunc(17-17)router-tests/testenv/testenv.go (4)
Run(105-122)RunWithError(135-150)Environment(1727-1763)SubgraphsConfig(365-378)
router/internal/retrytransport/retry_transport.go (3)
router/internal/retrytransport/manager.go (1)
Manager(34-40)router/internal/expr/expr.go (2)
Request(65-74)Context(34-38)router/internal/context/keys.go (1)
CurrentSubgraphContextKey(7-7)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: build-router
- GitHub Check: build_push_image
- GitHub Check: integration_test (./events)
- GitHub Check: build_push_image (nonroot)
- GitHub Check: image_scan (nonroot)
- GitHub Check: build_test
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: image_scan
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (4)
router-tests/retry_test.go (4)
10-13: Good import adjustments for new API usage.
The added imports align with the builder-style options and testify assertions.
16-23: OnRetry counter helper is concise and thread-safe.
Atomic counters and nil-guard for duration are correct.
589-619: Per-subgraph invalid algorithm correctly fails fast.
Solid negative test to mirror the base-level case.
699-775: Mixed global + per-subgraph overrides behave as intended.
Employees inherit global, test1 overrides; counters confirm isolation.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
router/internal/retrytransport/retry_transport.go (1)
147-152: Critical: retries will resend a consumed request bodyOn retry, req.Body is already consumed. If req.GetBody is unset, retries are unsafe; if set, reset Body before each retry. Otherwise you’ll send an empty body or fail.
Apply:
- // Wait for the specified duration - time.Sleep(sleepDuration) - - // drain the previous response before retrying - rt.drainBody(resp, requestLogger) + // Drain previous response ASAP to free the connection for reuse + rt.drainBody(resp, requestLogger) + + // Respect request cancellation while waiting + if sleepDuration > 0 { + timer := time.NewTimer(sleepDuration) + select { + case <-req.Context().Done(): + timer.Stop() + return resp, req.Context().Err() + case <-timer.C: + } + } + + // Rewind request body if present + if req.Body != nil { + if req.GetBody == nil { + requestLogger.Debug("cannot retry: non-rewindable request body") + break + } + if newBody, gErr := req.GetBody(); gErr != nil { + return resp, gErr + } else { + req.Body = newBody + } + } // Retry the request resp, err = rt.roundTripper.RoundTrip(req)
♻️ Duplicate comments (1)
router/internal/retrytransport/retry_transport.go (1)
104-108: Nice fix: avoids a second request when no retry optionsReturning the outcome of the first attempt preserves semantics and errors.
🧹 Nitpick comments (5)
router/internal/retrytransport/retry_transport.go (5)
58-75: Honor Retry-After for 503 as well and accept zero-second delaysRFC 7231 defines Retry-After for 503 and 429. Also treat “0” as a valid immediate retry (no sleep), not a fallback to random backoff.
Apply:
-func shouldUseRetryAfter(logger *zap.Logger, resp *http.Response, maxDuration time.Duration) (time.Duration, bool) { - if resp == nil || resp.StatusCode != http.StatusTooManyRequests { +func shouldUseRetryAfter(logger *zap.Logger, resp *http.Response, maxDuration time.Duration) (time.Duration, bool) { + if resp == nil || (resp.StatusCode != http.StatusTooManyRequests && resp.StatusCode != http.StatusServiceUnavailable) { return 0, false } @@ - return duration, duration > 0 + // Accept immediate retry (0s) as valid + return duration, duration >= 0
22-56: Minor: trim header value before parsing; lower log level on parse failuresReal-world proxies sometimes add whitespace; parsing “ 120 ” should work. Also consider debug level to reduce noise.
Apply:
import ( "errors" "io" "net/http" + "strings" "strconv" "time" @@ -func parseRetryAfterHeader(logger *zap.Logger, retryAfter string) time.Duration { +func parseRetryAfterHeader(logger *zap.Logger, retryAfter string) time.Duration { if retryAfter == "" { return 0 } var errJoin error - seconds, err := strconv.Atoi(retryAfter) + retryAfter = strings.TrimSpace(retryAfter) + seconds, err := strconv.Atoi(retryAfter) @@ - if errJoin != nil { - logger.Error("Failed to parse Retry-After header", zap.String("retry-after", retryAfter), zap.Error(errJoin)) - } + if errJoin != nil { + logger.Debug("Failed to parse Retry-After header", zap.String("retry-after", retryAfter), zap.Error(errJoin)) + }
144-149: Drain before sleeping to free connections earlierDraining before the wait reduces connection pinning during long sleeps.
(Already included in the diff above.)
183-187: Guard against nil resp in helpers (defensive)Not strictly necessary given current call sites, but adding a nil check makes the helper safer.
Apply:
func isResponseOK(resp *http.Response) bool { + if resp == nil { + return false + } // Ensure we don't wait for no reason when subgraphs don't behave // spec-compliant and returns a different status code than 200. return resp.StatusCode >= 200 && resp.StatusCode < 300 }
175-181: Polish the comment grammarImprove clarity of the connection reuse note.
Apply:
- // When we close the body only will go internally marks the persisted connection as true - // which is important so that it can reuse the connection internally for retrying + // Fully consuming and closing the body marks the connection reusable + // so the Transport can keep-alive and reuse it for retries.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
router/internal/retrytransport/manager.go(1 hunks)router/internal/retrytransport/retry_transport.go(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- router/internal/retrytransport/manager.go
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: SkArchon
PR: wundergraph/cosmo#2167
File: router-tests/structured_logging_test.go:712-713
Timestamp: 2025-08-28T10:07:27.637Z
Learning: In the router retry configuration, when retries are disabled (enabled=false in WithSubgraphRetryOptions), the algorithm parameter should not be validated since it won't be used. Empty algorithm strings are acceptable when retries are disabled.
📚 Learning: 2025-09-01T13:44:13.033Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2172
File: router/core/router_config.go:221-221
Timestamp: 2025-09-01T13:44:13.033Z
Learning: The IsEnabled() method on SubgraphRetryOptions in router/core/router.go safely handles nil receivers by returning false as the first check, so no external nil checks are needed when calling c.retryOptions.IsEnabled(). This is a consistent pattern used across multiple IsEnabled() methods in the codebase.
Applied to files:
router/internal/retrytransport/retry_transport.go
📚 Learning: 2025-09-01T13:44:13.033Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2172
File: router/core/router_config.go:221-221
Timestamp: 2025-09-01T13:44:13.033Z
Learning: The IsEnabled() method on SubgraphRetryOptions in router/core/router.go safely handles nil receivers by returning false, so no external nil checks are needed when calling c.retryOptions.IsEnabled().
Applied to files:
router/internal/retrytransport/retry_transport.go
📚 Learning: 2025-09-01T13:26:16.245Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2172
File: router/core/retry_builder.go:13-15
Timestamp: 2025-09-01T13:26:16.245Z
Learning: In router/core/retry_builder.go, the BuildRetryFunction's manager parameter is guaranteed to be non-nil by the calling code, so defensive nil checks are not needed according to the team's preference.
Applied to files:
router/internal/retrytransport/retry_transport.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
PR: wundergraph/cosmo#2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.
Applied to files:
router/internal/retrytransport/retry_transport.go
🧬 Code graph analysis (1)
router/internal/retrytransport/retry_transport.go (3)
router/internal/retrytransport/manager.go (1)
Manager(34-40)router/internal/expr/expr.go (2)
Request(65-74)Context(34-38)router/internal/context/keys.go (1)
CurrentSubgraphContextKey(7-7)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: build-router
- GitHub Check: Analyze (go)
- GitHub Check: image_scan (nonroot)
- GitHub Check: build_push_image (nonroot)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: image_scan
- GitHub Check: build_push_image
- GitHub Check: integration_test (./events)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: build_test
🔇 Additional comments (5)
router/internal/retrytransport/retry_transport.go (5)
16-20: Good: transport now composes roundTripper and retryManager cleanlyField injection improves testability and decoupling.
77-87: Constructor wiring looks solidSignature change and assignments are consistent.
139-142: OnRetry hook usage is correctAPI matches Manager.OnRetry(retries, req, resp, sleepDuration, err).
110-111: Usage correct: backoff.New expects (max, interval) signature
The argumentsretryOptions.MaxDuration(max) andretryOptions.Interval(interval) align withfunc New(max time.Duration, interval time.Duration) *Backoff; no changes needed.
96-103: VerifyCurrentSubgraphContextKeyvalue type
Ensure that any value stored underrcontext.CurrentSubgraphContextKeyis always astring; adjust the type assertion if other types are ever used.
|
@coderabbitai pause |
✅ Actions performedReviews paused. |
|
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
|
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
|
@StarpTech Could I get a re-review (remaining on my side to fix circuit breaker test) |
|
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
|
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2172 +/- ##
==========================================
+ Coverage 63.46% 65.82% +2.36%
==========================================
Files 251 255 +4
Lines 26767 26482 -285
==========================================
+ Hits 16987 17432 +445
+ Misses 8416 7652 -764
- Partials 1364 1398 +34
🚀 New features to boost your workflow:
|
82c297e to
4f52652
Compare
Fixes ENG-7966
This PR adds the option for users to set retry settings per subgraph. If a subgraph retry is specified for all, it will be used by default, with any subgraph overrides either disabling or changing the settings per subgraph. Example configuration
This PR also adds a fix for upgrade requests for circuit breakers, as the engine hooks context key with the subgraph name is not set on upgrade requests.
Summary by CodeRabbit
New Features
Refactor
Tests
Checklist